Made by
Github: https://github.com/MarioAuditore/Statistics-energy-based-GAN/tree/main
Disclaimer: if a GIF doesn't play in the notebook, just visit the provided links (all animations are uploaded on GitHub).
GAN is a family of generative models defined through a minimax game between a generator $G$ and a discriminator $D$. $G$ takes a latent code $z$ from a prior distribution $p(z)$ and produces a sample $G(z)\in X$. The discriminator takes a sample $x \in X$ as input and aims to distinguish real data from fake samples produced by the generator; $p_d$ denotes the true data distribution and $p_g$ denotes the implicit distribution induced by the prior and $G$. The standard non-saturating training objective for the discriminator is:
$$ L_D = - \mathbb{E}_{x\sim p_d}[\log D(x)] - \mathbb{E}_{z\sim p(z)}[\log (1 - D(G(z)))] $$ And for the generator: $$ L_G = - \mathbb{E}_{z\sim p(z)}[\log D(G(z))] $$
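Assuming the discriminator outputs probabilities $D(x)\in(0,1)$, the two objectives above can be sketched as Monte Carlo averages over a batch. This is a minimal numpy illustration (function names are ours); in practice these would be computed on discriminator logits with autograd:

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator loss: -E[log D(x)] - E[log(1 - D(G(z)))].

    d_real: D's probabilities on a batch of real samples.
    d_fake: D's probabilities on a batch of generated samples.
    """
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def g_loss(d_fake):
    """Non-saturating generator loss: -E[log D(G(z))]."""
    return -np.mean(np.log(d_fake))

# Sanity check: an undecided discriminator (D = 1/2 everywhere)
# gives L_D = 2 log 2 and L_G = log 2.
p_half = np.full(8, 0.5)
ld = d_loss(p_half, p_half)
lg = g_loss(p_half)
```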
The Boltzmann distribution $p(x) = \frac{\exp(-E(x))}{Z}$, where $x\in X$, $X$ is the state space, $E(x) : X \rightarrow \mathbb{R}$ is the energy function and $Z$ is the normalizing constant (partition function), defines an EBM.
The generator distribution can be reweighted by the discriminator logit $d(x)$ (the pre-sigmoid output of $D$):
$$ p_{d^*}(x) = \frac{p_g (x)\exp \left(d(x)\right)}{Z_0}$$
For the optimal discriminator, $d^*(x) = \log \frac{p_d(x)}{p_g(x)}$, so substituting gives $p_{d^*}(x) = \frac{p_g(x) \, p_d(x)/p_g(x)}{Z_0} = p_d(x)$. Therefore, if $D = D^*$ then $p_{d^*} = p_d$: the weighting and normalization exactly correct the bias of the generator.
The original paper describes several ways of sampling. In our experiments we used Discriminator Langevin Sampling.
The energy functional is defined as $$E(z) = -\log p_0(z) - D(G(z)),$$ where $z$ is the latent noise, $D$ is the discriminator (its pre-sigmoid logit) and $G$ is the generator.
The sampling algorithm is defined as follows:

```
Algorithm: Langevin sampling in latent space
--------------------------------------------
Input:  N > 0, eps > 0
Output: z_N approximately distributed as p_t(z)
--------------------------------------------
Sample z_0 ~ p_0(z)
for i = 0, ..., N-1:
    n_i ~ Normal(0, I)
    z_{i+1} = z_i - eps/2 * ∇_z E(z_i) + √eps * n_i
end for
```
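The update rule can be sketched generically in numpy. Here we apply it to a toy quadratic energy $E(z) = \|z\|^2/2$ with a known gradient, just to show the chain converging to its target; in the GAN setting `grad_E` would instead be obtained by backpropagating $-\log p_0(z) - D(G(z))$ through the networks:

```python
import numpy as np

def langevin_sample(grad_E, z0, n_steps=100, eps=1e-2, rng=None):
    """Unadjusted Langevin dynamics:
    z_{i+1} = z_i - eps/2 * grad_E(z_i) + sqrt(eps) * n_i,  n_i ~ N(0, I).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z - 0.5 * eps * grad_E(z) + np.sqrt(eps) * noise
    return z

# Toy check: with E(z) = ||z||^2 / 2 (grad_E(z) = z) the stationary
# distribution is approximately the standard normal.
rng = np.random.default_rng(0)
z0 = rng.standard_normal((5000, 2))   # 5000 independent chains
z = langevin_sample(lambda v: v, z0, n_steps=500, eps=1e-2, rng=rng)
```

The small discretization bias of the unadjusted chain is exactly why `eps` matters in the experiments below.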
The idea can be developed further: other sampling techniques based on the energy functional can also be applied.
We conducted experiments on the models and datasets from the paper. We also used one of the scores presented in the paper, the Fréchet Inception Distance (FID). FID passes both real and generated images through a pretrained classification network and extracts their feature representations from one of the last layers; it then fits a Gaussian to each set of features and measures the Fréchet distance between the two Gaussians. The better the GAN, the closer these representations are and the lower the score. In our case we used the FID implementation from the torcheval library (link).
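The distance underlying FID has a closed form for Gaussians: $d^2 = \|\mu_1-\mu_2\|^2 + \operatorname{Tr}(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2})$. A minimal numpy/scipy sketch of that computation on already-extracted feature vectors (the Inception feature extraction itself, which torcheval handles internally, is omitted here):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_fake):
    """Squared Frechet distance between Gaussians fit to two feature sets.

    feats_real, feats_fake: arrays of shape (n_samples, n_features).
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny numerical imaginary parts
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))
```

Identical feature sets give a distance of zero, and the score grows as the two feature distributions drift apart.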
Experiment on github: https://github.com/MarioAuditore/Statistics-energy-based-GAN/blob/main/Celeba-GAN.ipynb
CelebA is a popular dataset of celebrity faces. Our idea was to take pre-trained generator and discriminator models and apply the sampling techniques to them. First of all, here is the default generation of faces from noise (the numbers above the images are discriminator scores):
For this batch FID = 301.8684
Then we apply Langevin Dynamics Sampling to the noise generated in latent space. We experiment with two hyperparameters:
- `eps` stands for the step size
- `N` stands for the number of sampling iterations

Below are the results for different `N` and `eps`:
| N  | eps  | FID      |
|----|------|----------|
| 20 | 1e-1 | 293.1780 |
| 20 | 1e-3 | 296.0798 |
| 20 | 1e-5 | 306.0943 |
| 20 | 1e-7 | 304.4666 |